Attention based Sentence Extraction from Scientific Articles using Pseudo-Labeled data

Authors

  • Parth Mehta
  • Gaurav Arora
  • Prasenjit Majumder
Abstract

In this work, we present a weakly supervised sentence extraction technique for identifying important sentences in scientific papers that are worthy of inclusion in the abstract. We propose a new attention-based deep learning architecture that jointly learns to identify important content as well as the cue phrases that indicate summary-worthy sentences. We propose a new context embedding technique that uses topic models to determine the focus of a given paper, and we use it jointly with an LSTM-based sequence encoder to learn attention weights across the words of a sentence. We use a collection of articles publicly available through the ACL Anthology for our experiments. Our system outperforms several state-of-the-art extractive techniques on several ROUGE metrics. It also generates more coherent summaries and preserves the overall structure of the document.
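As a rough illustration of the idea of learning attention weights across sentence words with respect to a context embedding, the sketch below scores each word state against a context vector and normalizes the scores with a softmax. This is a minimal sketch under stated assumptions, not the authors' architecture: the dot-product scoring, the toy vectors, and the names `attend`, `word_states`, and `context` are all hypothetical, and the abstract does not specify how the scores are actually computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(word_states, context):
    """Return attention weights and the weighted sum of word states.

    Assumption: scores are plain dot products between each word state
    and the context vector; the paper may use a different scoring function.
    """
    scores = [dot(h, context) for h in word_states]
    weights = softmax(scores)
    dim = len(word_states[0])
    sentence_vec = [
        sum(w * h[i] for w, h in zip(weights, word_states))
        for i in range(dim)
    ]
    return weights, sentence_vec


# Toy stand-ins for encoder outputs (hypothetical values, one vector per word).
word_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical context embedding, standing in for a topic-model-derived vector.
context = [2.0, 0.0]

weights, sentence_vec = attend(word_states, context)
```

In this toy setup, words whose states align with the context vector receive larger weights, so they contribute more to the sentence representation.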

Similar resources

Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision

Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (the PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet...

Scientific Achievements of Medical Journals in Occupational Accidents

Background: Occupational accidents are the second leading cause of occupational fatality in Iran and are among the major health, social, and economic risk factors. Since the publication of scientific articles in the field of occupational accidents reflects researchers' concern for this important issue, the present study aimed to evaluate the scientific achievements in the field of occupational acc...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Recognizing entailments in legal texts using sentence encoding-based and decomposable attention models

This paper presents an end-to-end question answering system for legal texts. This system includes two main phases. In the first phase, our system retrieves articles from the Japanese Civil Code that are relevant to the given question, using cosine distance after the question and articles are converted into vectors with the TF-IDF weighting scheme. Then, a ranking model can be applied to...

Analysis of Scientific Publications in the Field of Ethics in Accounting

Background: Scientific articles represent the efforts of researchers, are a useful and valuable source of information, and can be taken as a basis for scientific and performance analysis. The purpose of this research is to study the scientific production in the subject area of ethics in accounting. Method: This descriptive-analytical research examined 145 articles of the subject area of ethics ...

Journal title:
  • CoRR

Volume abs/1802.04675

Publication date: 2018